Load the necessary libraries
library(car) # for regression diagnostics
library(broom) # for tidy output
library(ggfortify) # for model diagnostics
library(sjPlot) # for outputs
library(knitr) # for kable
library(effects) # for partial effects plots
library(emmeans) # for estimating marginal means
library(MASS) # for glm.nb
library(MuMIn) # for AICc
library(tidyverse) # for data wrangling
library(brms)
library(broom.mixed)
library(tidybayes)
library(bayesplot)
library(standist) # for visualizing distributions
library(rstanarm)
library(ggeffects)
library(rstan)
library(DHARMa)
Crab_shrimp_coral
To investigate synergistic coral defence by mutualist crustaceans, Mckeon et al. (2012) conducted an aquaria experiment in which colonies of a coral species were placed in a tank along with a predatory sea star and one of four symbiont combinations:
The experiments were conducted in a large octagonal flow-through seawater tank that was partitioned into eight sections, which thereby permitted two of each of the four symbiont combinations to be observed concurrently. The tank was left overnight and in the morning, the presence of feeding scars on each coral colony was scored as evidence of predation. The experiments were repeated ten times, each time with fresh coral colonies, sea stars and symbiont.
The ten experimental times represent blocks (random effects) within which the symbiont type (fixed effect) are nested.
mckeon <- read_csv("../public/data/mckeon.csv", trim_ws = TRUE)
mckeon %>% glimpse()
## Rows: 80
## Columns: 3
## $ BLOCK <dbl> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10…
## $ PREDATION <dbl> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, …
## $ SYMBIONT <chr> "none", "none", "none", "none", "none", "none", "none", "non…
Since the response here is the presence or absence of predation (feeding scars), a binomial distribution is appropriate.
We need to make sure that the categorical variable and the random effect are defined as factors. When doing so, it might be valuable to rearrange the order of the fixed effect (SYMBIONT) such that the none group is considered the first group. This way, the other levels will all naturally be compared to this level (hence it will be treated as a reference of control group).
mckeon <- mckeon %>%
mutate(
BLOCK = factor(BLOCK),
SYMBIONT = factor(SYMBIONT, levels = c("none", "crabs", "shrimp", "both"))
)
Model formula: \[ y_i \sim{} \mathcal{N}(n, p_i)\\ ln\left(\frac{p_i}{1-p_1}\right) =\boldsymbol{\beta} \bf{X_i} + \boldsymbol{\gamma} \bf{Z_i} \]
where \(\boldsymbol{\beta}\) and \(\boldsymbol{\gamma}\) are vectors of the fixed and random effects parameters respectively and \(\bf{X}\) is the model matrix representing the overall intercept and effects of symbionts on the probability of the colony experiencing predation. \(\bf{Z}\) represents a cell means model matrix for the random intercepts associated with individual coral colonies.
ggplot(mckeon, aes(y = PREDATION, x = SYMBIONT)) +
geom_point(position = position_jitter(width = 0.2, height = 0)) +
facet_wrap(~BLOCK)
In rstanarm, the default priors are designed to be weakly informative. They are chosen to provide moderate regularisation (to help prevent over-fitting) and help stabilise the computations.
mckeon.rstanarm <- stan_glmer(PREDATION ~ SYMBIONT + (1 | BLOCK),
data = mckeon,
family = binomial(link = "logit"),
iter = 5000,
warmup = 2000,
chains = 3,
thin = 5,
refresh = 0,
cores = 3
)
mckeon.rstanarm %>% prior_summary()
## Priors for model '.'
## ------
## Intercept (after predictors centered)
## ~ normal(location = 0, scale = 2.5)
##
## Coefficients
## Specified prior:
## ~ normal(location = [0,0,0], scale = [2.5,2.5,2.5])
## Adjusted prior:
## ~ normal(location = [0,0,0], scale = [5.74,5.74,5.74])
##
## Covariance
## ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
## ------
## See help('prior_summary.stanreg') for more details
This tells us:
2.5 * sd(mckeon$PREDATION)
## [1] 1.217943
model.matrix(~SYMBIONT, data = mckeon) %>%
apply(2, sd) %>%
(function(x) 2.5 / x)
## (Intercept) SYMBIONTcrabs SYMBIONTshrimp SYMBIONTboth
## Inf 5.737305 5.737305 5.737305
Lets now run with priors only so that we can explore the range of values they allow in the posteriors.
mckeon.rstanarm1 <- update(mckeon.rstanarm, prior_PD = TRUE)
ggpredict(mckeon.rstanarm1) %>% plot(add.data = TRUE, jitter = c(0.5, 0))
## $SYMBIONT
Conclusions:
The following link provides some guidance about defining priors. [https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations]
When defining our own priors, we typically do not want them to be scaled.
If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):
I will also overlay the raw data for comparison.
mckeon.rstanarm2 <- stan_glmer(PREDATION ~ SYMBIONT + (1 | BLOCK),
data = mckeon,
family = binomial(link = "logit"),
prior_intercept = normal(0, 2, autoscale = FALSE),
prior = normal(0, 10, autoscale = FALSE),
prior_covariance = decov(1, 1, 1, 1),
prior_PD = TRUE,
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
refresh = 0
)
mckeon.rstanarm2 %>%
ggpredict(~SYMBIONT) %>%
plot(add.data = TRUE, jitter = c(0.5, 0))
In brms, the default priors are designed to be weakly informative. They are chosen to provide moderate regularisation (to help prevent over fitting) and help stabilise the computations.
Unlike rstanarm, brms models must be compiled before they start sampling. For most models, the compilation of the stan code takes around 45 seconds.
mckeon.form <- bf(PREDATION | trials(1) ~ SYMBIONT + (1|BLOCK),
family=binomial(link='logit'))
options(width=150)
mckeon.form %>% get_prior(data = mckeon)
## prior class coef group resp dpar nlpar bound source
## (flat) b default
## (flat) b SYMBIONTboth (vectorized)
## (flat) b SYMBIONTcrabs (vectorized)
## (flat) b SYMBIONTshrimp (vectorized)
## student_t(3, 0, 2.5) Intercept default
## student_t(3, 0, 2.5) sd default
## student_t(3, 0, 2.5) sd BLOCK (vectorized)
## student_t(3, 0, 2.5) sd Intercept BLOCK (vectorized)
options(width=80)
The following link provides some guidance about defining priors. [https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations]
When defining our own priors, we typically do not want them to be scaled.
If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):
Note, for hierarchical models, the model will tend to want to have a large \(sigma\) in order to fit the data better. It is a good idea to regularise this tendency by applying a prior that has most mass around zero. Suitable candidates include:
I will also overlay the raw data for comparison.
mckeon %>%
group_by(SYMBIONT) %>%
summarise(
mean(PREDATION),
var(PREDATION)
)
standist::visualize("normal(0,2)", xlim = c(0, 200))
standist::visualize("student_t(3, 0, 2.5)",
"gamma(2,0.5)",
"cauchy(0,1)",
xlim = c(-10, 25)
)
priors <- prior(normal(0, 2), class = "Intercept") +
prior(normal(0, 10), class = "b") +
prior(cauchy(0, 1), class = "sd")
mckeon.form <- bf(PREDATION | trials(1) ~ SYMBIONT + (1 | BLOCK),
family = binomial(link = "logit")
)
mckeon.brm2 <- brm(mckeon.form,
data = mckeon,
prior = priors,
sample_prior = "yes",
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
refresh = 0
)
mckeon.form <- bf(PREDATION | trials(1) ~ SYMBIONT + (SYMBIONT | BLOCK),
family = binomial(link = "logit")
)
mckeon.brm3 <- brm(mckeon.form,
data = mckeon,
prior = priors,
sample_prior = "yes",
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
refresh = 0,
control = list(adapt_delta = 0.99)
)
(l.1 <- mckeon.brm2 %>% loo())
##
## Computed from 2400 by 80 log-likelihood matrix
##
## Estimate SE
## elpd_loo -28.4 9.7
## p_loo 10.4 4.3
## looic 56.8 19.5
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 67 83.8% 667
## (0.5, 0.7] (ok) 11 13.8% 295
## (0.7, 1] (bad) 1 1.2% 112
## (1, Inf) (very bad) 1 1.2% 21
## See help('pareto-k-diagnostic') for details.
(l.2 <- mckeon.brm3 %>% loo())
##
## Computed from 2400 by 80 log-likelihood matrix
##
## Estimate SE
## elpd_loo -26.8 8.6
## p_loo 13.9 5.5
## looic 53.5 17.2
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 58 72.5% 278
## (0.5, 0.7] (ok) 14 17.5% 1130
## (0.7, 1] (bad) 6 7.5% 6
## (1, Inf) (very bad) 2 2.5% 27
## See help('pareto-k-diagnostic') for details.
loo_compare(l.1, l.2)
## elpd_diff se_diff
## . 0.0 0.0
## . -1.6 2.4
mckeon.brm3 %>% get_variables()
## [1] "b_Intercept"
## [2] "b_SYMBIONTcrabs"
## [3] "b_SYMBIONTshrimp"
## [4] "b_SYMBIONTboth"
## [5] "sd_BLOCK__Intercept"
## [6] "sd_BLOCK__SYMBIONTcrabs"
## [7] "sd_BLOCK__SYMBIONTshrimp"
## [8] "sd_BLOCK__SYMBIONTboth"
## [9] "cor_BLOCK__Intercept__SYMBIONTcrabs"
## [10] "cor_BLOCK__Intercept__SYMBIONTshrimp"
## [11] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTshrimp"
## [12] "cor_BLOCK__Intercept__SYMBIONTboth"
## [13] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTboth"
## [14] "cor_BLOCK__SYMBIONTshrimp__SYMBIONTboth"
## [15] "r_BLOCK[1,Intercept]"
## [16] "r_BLOCK[2,Intercept]"
## [17] "r_BLOCK[3,Intercept]"
## [18] "r_BLOCK[4,Intercept]"
## [19] "r_BLOCK[5,Intercept]"
## [20] "r_BLOCK[6,Intercept]"
## [21] "r_BLOCK[7,Intercept]"
## [22] "r_BLOCK[8,Intercept]"
## [23] "r_BLOCK[9,Intercept]"
## [24] "r_BLOCK[10,Intercept]"
## [25] "r_BLOCK[1,SYMBIONTcrabs]"
## [26] "r_BLOCK[2,SYMBIONTcrabs]"
## [27] "r_BLOCK[3,SYMBIONTcrabs]"
## [28] "r_BLOCK[4,SYMBIONTcrabs]"
## [29] "r_BLOCK[5,SYMBIONTcrabs]"
## [30] "r_BLOCK[6,SYMBIONTcrabs]"
## [31] "r_BLOCK[7,SYMBIONTcrabs]"
## [32] "r_BLOCK[8,SYMBIONTcrabs]"
## [33] "r_BLOCK[9,SYMBIONTcrabs]"
## [34] "r_BLOCK[10,SYMBIONTcrabs]"
## [35] "r_BLOCK[1,SYMBIONTshrimp]"
## [36] "r_BLOCK[2,SYMBIONTshrimp]"
## [37] "r_BLOCK[3,SYMBIONTshrimp]"
## [38] "r_BLOCK[4,SYMBIONTshrimp]"
## [39] "r_BLOCK[5,SYMBIONTshrimp]"
## [40] "r_BLOCK[6,SYMBIONTshrimp]"
## [41] "r_BLOCK[7,SYMBIONTshrimp]"
## [42] "r_BLOCK[8,SYMBIONTshrimp]"
## [43] "r_BLOCK[9,SYMBIONTshrimp]"
## [44] "r_BLOCK[10,SYMBIONTshrimp]"
## [45] "r_BLOCK[1,SYMBIONTboth]"
## [46] "r_BLOCK[2,SYMBIONTboth]"
## [47] "r_BLOCK[3,SYMBIONTboth]"
## [48] "r_BLOCK[4,SYMBIONTboth]"
## [49] "r_BLOCK[5,SYMBIONTboth]"
## [50] "r_BLOCK[6,SYMBIONTboth]"
## [51] "r_BLOCK[7,SYMBIONTboth]"
## [52] "r_BLOCK[8,SYMBIONTboth]"
## [53] "r_BLOCK[9,SYMBIONTboth]"
## [54] "r_BLOCK[10,SYMBIONTboth]"
## [55] "prior_Intercept"
## [56] "prior_b"
## [57] "prior_sd_BLOCK"
## [58] "prior_cor_BLOCK"
## [59] "lp__"
## [60] "accept_stat__"
## [61] "stepsize__"
## [62] "treedepth__"
## [63] "n_leapfrog__"
## [64] "divergent__"
## [65] "energy__"
mckeon.brm3 %>%
hypothesis("SYMBIONTcrabs=0") %>%
plot()
mckeon.brm3 %>%
posterior_samples() %>%
dplyr::select(-`lp__`) %>%
pivot_longer(everything(), names_to = "key") %>%
filter(!str_detect(key, "^r")) %>%
mutate(
Type = ifelse(str_detect(key, "prior"), "Prior", "Posterior"),
Class = case_when(
str_detect(key, "(^b|^prior).*Intercept$") ~ "Intercept",
str_detect(key, "b_SYMBIONT.*|prior_b") ~ "TREATMENT",
str_detect(key, "sd") ~ "sd",
str_detect(key, "^cor|prior_cor") ~ "cor",
str_detect(key, "sigma") ~ "sigma"
),
Par = str_replace(key, "b_", "")
) %>%
ggplot(aes(x = Type, y = value, color = Par)) +
stat_pointinterval(position = position_dodge(), show.legend = FALSE) +
facet_wrap(~Class, scales = "free")
In addition to the regular model diagnostics checking, for Bayesian analyses, it is also necessary to explore the MCMC sampling diagnostics to be sure that the chains are well mixed and have converged on a stable posterior.
There are a wide variety of tests that range from the big picture, overall chain characteristics to the very specific detailed tests that allow the experienced modeller to drill down to the very fine details of the chain behaviour. Furthermore, there are a multitude of packages and approaches for exploring these diagnostics.
The bayesplot package offers a range of MCMC diagnostics as well as Posterior Probability Checks (PPC), all of which have a convenient plot() interface. Lets start with the MCMC diagnostics.
available_mcmc()
## bayesplot MCMC module:
## mcmc_acf
## mcmc_acf_bar
## mcmc_areas
## mcmc_areas_data
## mcmc_areas_ridges
## mcmc_areas_ridges_data
## mcmc_combo
## mcmc_dens
## mcmc_dens_chains
## mcmc_dens_chains_data
## mcmc_dens_overlay
## mcmc_hex
## mcmc_hist
## mcmc_hist_by_chain
## mcmc_intervals
## mcmc_intervals_data
## mcmc_neff
## mcmc_neff_data
## mcmc_neff_hist
## mcmc_nuts_acceptance
## mcmc_nuts_divergence
## mcmc_nuts_energy
## mcmc_nuts_stepsize
## mcmc_nuts_treedepth
## mcmc_pairs
## mcmc_parcoord
## mcmc_parcoord_data
## mcmc_rank_hist
## mcmc_rank_overlay
## mcmc_recover_hist
## mcmc_recover_intervals
## mcmc_recover_scatter
## mcmc_rhat
## mcmc_rhat_data
## mcmc_rhat_hist
## mcmc_scatter
## mcmc_trace
## mcmc_trace_data
## mcmc_trace_highlight
## mcmc_violin
Of these, we will focus on:
pars <- mckeon.brm3 %>% get_variables()
pars <- pars %>%
str_extract("^b.Intercept|^b_SYMBIONT.*|[sS]igma|^sd.*") %>%
na.omit()
pars
## [1] "b_Intercept" "b_SYMBIONTcrabs"
## [3] "b_SYMBIONTshrimp" "b_SYMBIONTboth"
## [5] "sd_BLOCK__Intercept" "sd_BLOCK__SYMBIONTcrabs"
## [7] "sd_BLOCK__SYMBIONTshrimp" "sd_BLOCK__SYMBIONTboth"
## attr(,"na.action")
## [1] 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
## [26] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
## [51] 59 60 61 62 63 64 65
## attr(,"class")
## [1] "omit"
mckeon.brm3 %>% mcmc_plot(type = "trace", pars = pars)
The chains appear well mixed and very similar
mckeon.brm3 %>% mcmc_plot(type = "acf_bar", pars = pars)
There is no evidence of auto-correlation in the MCMC samples
mckeon.brm3 %>% mcmc_plot(type = "rhat_hist")
All Rhat values are below 1.05, suggesting the chains have converged.
neff_hist (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
mckeon.brm3 %>% mcmc_plot(type = "neff_hist")
Ratios all very high.
mckeon.brm3 %>% mcmc_plot(type = "combo", pars = pars)
mckeon.brm3 %>% mcmc_plot(type = "violin", pars = pars)
The rstan package offers a range of MCMC diagnostics. Lets start with the MCMC diagnostics.
Of these, we will focus on:
mckeon.brm3 %>% get_variables()
## [1] "b_Intercept"
## [2] "b_SYMBIONTcrabs"
## [3] "b_SYMBIONTshrimp"
## [4] "b_SYMBIONTboth"
## [5] "sd_BLOCK__Intercept"
## [6] "sd_BLOCK__SYMBIONTcrabs"
## [7] "sd_BLOCK__SYMBIONTshrimp"
## [8] "sd_BLOCK__SYMBIONTboth"
## [9] "cor_BLOCK__Intercept__SYMBIONTcrabs"
## [10] "cor_BLOCK__Intercept__SYMBIONTshrimp"
## [11] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTshrimp"
## [12] "cor_BLOCK__Intercept__SYMBIONTboth"
## [13] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTboth"
## [14] "cor_BLOCK__SYMBIONTshrimp__SYMBIONTboth"
## [15] "r_BLOCK[1,Intercept]"
## [16] "r_BLOCK[2,Intercept]"
## [17] "r_BLOCK[3,Intercept]"
## [18] "r_BLOCK[4,Intercept]"
## [19] "r_BLOCK[5,Intercept]"
## [20] "r_BLOCK[6,Intercept]"
## [21] "r_BLOCK[7,Intercept]"
## [22] "r_BLOCK[8,Intercept]"
## [23] "r_BLOCK[9,Intercept]"
## [24] "r_BLOCK[10,Intercept]"
## [25] "r_BLOCK[1,SYMBIONTcrabs]"
## [26] "r_BLOCK[2,SYMBIONTcrabs]"
## [27] "r_BLOCK[3,SYMBIONTcrabs]"
## [28] "r_BLOCK[4,SYMBIONTcrabs]"
## [29] "r_BLOCK[5,SYMBIONTcrabs]"
## [30] "r_BLOCK[6,SYMBIONTcrabs]"
## [31] "r_BLOCK[7,SYMBIONTcrabs]"
## [32] "r_BLOCK[8,SYMBIONTcrabs]"
## [33] "r_BLOCK[9,SYMBIONTcrabs]"
## [34] "r_BLOCK[10,SYMBIONTcrabs]"
## [35] "r_BLOCK[1,SYMBIONTshrimp]"
## [36] "r_BLOCK[2,SYMBIONTshrimp]"
## [37] "r_BLOCK[3,SYMBIONTshrimp]"
## [38] "r_BLOCK[4,SYMBIONTshrimp]"
## [39] "r_BLOCK[5,SYMBIONTshrimp]"
## [40] "r_BLOCK[6,SYMBIONTshrimp]"
## [41] "r_BLOCK[7,SYMBIONTshrimp]"
## [42] "r_BLOCK[8,SYMBIONTshrimp]"
## [43] "r_BLOCK[9,SYMBIONTshrimp]"
## [44] "r_BLOCK[10,SYMBIONTshrimp]"
## [45] "r_BLOCK[1,SYMBIONTboth]"
## [46] "r_BLOCK[2,SYMBIONTboth]"
## [47] "r_BLOCK[3,SYMBIONTboth]"
## [48] "r_BLOCK[4,SYMBIONTboth]"
## [49] "r_BLOCK[5,SYMBIONTboth]"
## [50] "r_BLOCK[6,SYMBIONTboth]"
## [51] "r_BLOCK[7,SYMBIONTboth]"
## [52] "r_BLOCK[8,SYMBIONTboth]"
## [53] "r_BLOCK[9,SYMBIONTboth]"
## [54] "r_BLOCK[10,SYMBIONTboth]"
## [55] "prior_Intercept"
## [56] "prior_b"
## [57] "prior_sd_BLOCK"
## [58] "prior_cor_BLOCK"
## [59] "lp__"
## [60] "accept_stat__"
## [61] "stepsize__"
## [62] "treedepth__"
## [63] "n_leapfrog__"
## [64] "divergent__"
## [65] "energy__"
pars <- mckeon.brm3 %>% get_variables()
pars <- str_extract(pars, "^b_.*|^sigma$|^sd.*") %>% na.omit()
mckeon.brm3$fit %>%
stan_trace(pars = pars)
The chains appear well mixed and very similar
mckeon.brm3$fit %>%
stan_ac(pars = pars)
There is no evidence of auto-correlation in the MCMC samples
mckeon.brm3$fit %>% stan_rhat()
All Rhat values are below 1.05, suggesting the chains have converged.
stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
mckeon.brm3$fit %>% stan_ess()
Ratios all very high.
mckeon.brm3$fit %>%
stan_dens(separate_chains = TRUE, pars = pars)
The ggmean package also as a set of MCMC diagnostic functions. Lets start with the MCMC diagnostics.
Of these, we will focus on:
## mckeon.ggs <- mckeon.brm3 %>% ggs(burnin = FALSE, inc_warmup = FALSE)
## mckeon.ggs %>% ggs_traceplot()
The chains appear well mixed and very similar
## ggs_autocorrelation(mckeon.ggs)
There is no evidence of auto-correlation in the MCMC samples
## ggs_Rhat(mckeon.ggs)
All Rhat values are below 1.05, suggesting the chains have converged.
stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
## ggs_effective(mckeon.ggs)
Ratios all very high.
## ggs_crosscorrelation(mckeon.ggs)
## ggs_grb(mckeon.ggs)
Post predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
available_ppc()
## bayesplot PPC module:
## ppc_bars
## ppc_bars_grouped
## ppc_boxplot
## ppc_data
## ppc_dens
## ppc_dens_overlay
## ppc_dens_overlay_grouped
## ppc_ecdf_overlay
## ppc_ecdf_overlay_grouped
## ppc_error_binned
## ppc_error_hist
## ppc_error_hist_grouped
## ppc_error_scatter
## ppc_error_scatter_avg
## ppc_error_scatter_avg_vs_x
## ppc_freqpoly
## ppc_freqpoly_grouped
## ppc_hist
## ppc_intervals
## ppc_intervals_data
## ppc_intervals_grouped
## ppc_km_overlay
## ppc_loo_intervals
## ppc_loo_pit
## ppc_loo_pit_data
## ppc_loo_pit_overlay
## ppc_loo_pit_qq
## ppc_loo_ribbon
## ppc_ribbon
## ppc_ribbon_data
## ppc_ribbon_grouped
## ppc_rootogram
## ppc_scatter
## ppc_scatter_avg
## ppc_scatter_avg_grouped
## ppc_stat
## ppc_stat_2d
## ppc_stat_freqpoly_grouped
## ppc_stat_grouped
## ppc_violin_grouped
mckeon.brm3 %>% pp_check(type = "dens_overlay", nsamples = 100)
The model draws appear to be consistent with the observed data.
mckeon.brm3 %>% pp_check(type = "error_scatter_avg")
This is not really interpretable
mckeon.brm3 %>% pp_check(group = "BIRD", type = "intervals")
The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.
# library(shinystan)
# launch_shinystan(mckeon.brm2)
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components yourself, we can still obtain the simulated residuals from the fitted stan model.
We need to supply:
preds <- mckeon.brm3 %>% posterior_predict(nsamples = 250, summary = FALSE)
mckeon.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = mckeon$PREDATION,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = TRUE
)
plot(mckeon.resids, quantreg = TRUE)
Conclusions:
mckeon.brm3 %>%
conditional_effects() %>%
plot(points = TRUE)
mckeon.brm3 %>%
ggpredict() %>%
plot(add.data = TRUE, jitter = c(0.5, 0))
## $SYMBIONT
mckeon.brm3 %>%
ggemmeans(~SYMBIONT) %>%
plot(add.data = TRUE, jitter = c(0.5, 0))
Partial.obs <- mckeon.brm3$data %>%
mutate(
Pred = predict(mckeon.brm3, re.form = NA)[, "Estimate"],
Resid = resid(mckeon.brm3)[, "Estimate"],
Obs = Pred + Resid
)
mckeon.brm3 %>%
fitted_draws(newdata = mckeon, re_formula = NA) %>%
median_hdci() %>%
ggplot(aes(x = SYMBIONT, y = .value)) +
geom_pointrange(aes(ymin = .lower, ymax = .upper)) +
geom_line() +
geom_point(
data = Partial.obs, aes(y = Obs, x = SYMBIONT),
position = position_nudge(x = 0.1)
) +
geom_point(
data = mckeon, aes(y = PREDATION, x = SYMBIONT), alpha = 0.2,
position = position_nudge(x = 0.05)
)
brms captures the MCMC samples from stan within the returned list. There are numerous ways to retrieve and summarise these samples. The first three provide convenient numeric summaries from which you can draw conclusions, the last four provide ways of obtaining the full posteriors.
The summary() method generates simple summaries (mean, standard deviation as well as 10, 50 and 90 percentiles).
mckeon.brm3 %>% summary()
## Family: binomial
## Links: mu = logit
## Formula: PREDATION | trials(1) ~ SYMBIONT + (SYMBIONT | BLOCK)
## Data: mckeon (Number of observations: 80)
## Draws: 3 chains, each with iter = 1000; warmup = 200; thin = 5;
## total post-warmup draws = 480
##
## Group-Level Effects:
## ~BLOCK (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept) 2.27 1.86 0.06 6.54 1.00
## sd(SYMBIONTcrabs) 35.90 219.89 0.12 195.62 1.00
## sd(SYMBIONTshrimp) 6.09 14.76 0.07 28.86 1.00
## sd(SYMBIONTboth) 6.69 14.45 0.09 32.86 1.00
## cor(Intercept,SYMBIONTcrabs) 0.25 0.42 -0.67 0.89 1.00
## cor(Intercept,SYMBIONTshrimp) 0.16 0.42 -0.69 0.85 1.00
## cor(SYMBIONTcrabs,SYMBIONTshrimp) 0.42 0.41 -0.60 0.94 1.00
## cor(Intercept,SYMBIONTboth) 0.13 0.43 -0.70 0.86 1.00
## cor(SYMBIONTcrabs,SYMBIONTboth) 0.34 0.41 -0.59 0.90 1.00
## cor(SYMBIONTshrimp,SYMBIONTboth) 0.38 0.43 -0.64 0.93 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 1006 2198
## sd(SYMBIONTcrabs) 1249 1995
## sd(SYMBIONTshrimp) 1091 2065
## sd(SYMBIONTboth) 1112 1902
## cor(Intercept,SYMBIONTcrabs) 1879 2026
## cor(Intercept,SYMBIONTshrimp) 1900 2283
## cor(SYMBIONTcrabs,SYMBIONTshrimp) 1497 2179
## cor(Intercept,SYMBIONTboth) 1803 2286
## cor(SYMBIONTcrabs,SYMBIONTboth) 1674 2162
## cor(SYMBIONTshrimp,SYMBIONTboth) 1207 2209
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 3.56 1.56 1.25 7.21 1.00 1579 1993
## SYMBIONTcrabs -2.01 4.19 -10.04 7.37 1.00 1696 2040
## SYMBIONTshrimp -3.31 2.76 -8.52 2.34 1.00 1874 1900
## SYMBIONTboth -5.20 2.89 -11.69 -0.31 1.00 2105 2311
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
mckeon.brm3$fit %>%
tidyMCMC(
estimate.method = "median",
conf.int = TRUE, conf.method = "HPDinterval",
rhat = TRUE, ess = TRUE
)
Conclusions:
see above
Due to the presence of a log transform in the predictor, it is better to use the regex version.
mckeon.brm3 %>% get_variables()
## [1] "b_Intercept"
## [2] "b_SYMBIONTcrabs"
## [3] "b_SYMBIONTshrimp"
## [4] "b_SYMBIONTboth"
## [5] "sd_BLOCK__Intercept"
## [6] "sd_BLOCK__SYMBIONTcrabs"
## [7] "sd_BLOCK__SYMBIONTshrimp"
## [8] "sd_BLOCK__SYMBIONTboth"
## [9] "cor_BLOCK__Intercept__SYMBIONTcrabs"
## [10] "cor_BLOCK__Intercept__SYMBIONTshrimp"
## [11] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTshrimp"
## [12] "cor_BLOCK__Intercept__SYMBIONTboth"
## [13] "cor_BLOCK__SYMBIONTcrabs__SYMBIONTboth"
## [14] "cor_BLOCK__SYMBIONTshrimp__SYMBIONTboth"
## [15] "r_BLOCK[1,Intercept]"
## [16] "r_BLOCK[2,Intercept]"
## [17] "r_BLOCK[3,Intercept]"
## [18] "r_BLOCK[4,Intercept]"
## [19] "r_BLOCK[5,Intercept]"
## [20] "r_BLOCK[6,Intercept]"
## [21] "r_BLOCK[7,Intercept]"
## [22] "r_BLOCK[8,Intercept]"
## [23] "r_BLOCK[9,Intercept]"
## [24] "r_BLOCK[10,Intercept]"
## [25] "r_BLOCK[1,SYMBIONTcrabs]"
## [26] "r_BLOCK[2,SYMBIONTcrabs]"
## [27] "r_BLOCK[3,SYMBIONTcrabs]"
## [28] "r_BLOCK[4,SYMBIONTcrabs]"
## [29] "r_BLOCK[5,SYMBIONTcrabs]"
## [30] "r_BLOCK[6,SYMBIONTcrabs]"
## [31] "r_BLOCK[7,SYMBIONTcrabs]"
## [32] "r_BLOCK[8,SYMBIONTcrabs]"
## [33] "r_BLOCK[9,SYMBIONTcrabs]"
## [34] "r_BLOCK[10,SYMBIONTcrabs]"
## [35] "r_BLOCK[1,SYMBIONTshrimp]"
## [36] "r_BLOCK[2,SYMBIONTshrimp]"
## [37] "r_BLOCK[3,SYMBIONTshrimp]"
## [38] "r_BLOCK[4,SYMBIONTshrimp]"
## [39] "r_BLOCK[5,SYMBIONTshrimp]"
## [40] "r_BLOCK[6,SYMBIONTshrimp]"
## [41] "r_BLOCK[7,SYMBIONTshrimp]"
## [42] "r_BLOCK[8,SYMBIONTshrimp]"
## [43] "r_BLOCK[9,SYMBIONTshrimp]"
## [44] "r_BLOCK[10,SYMBIONTshrimp]"
## [45] "r_BLOCK[1,SYMBIONTboth]"
## [46] "r_BLOCK[2,SYMBIONTboth]"
## [47] "r_BLOCK[3,SYMBIONTboth]"
## [48] "r_BLOCK[4,SYMBIONTboth]"
## [49] "r_BLOCK[5,SYMBIONTboth]"
## [50] "r_BLOCK[6,SYMBIONTboth]"
## [51] "r_BLOCK[7,SYMBIONTboth]"
## [52] "r_BLOCK[8,SYMBIONTboth]"
## [53] "r_BLOCK[9,SYMBIONTboth]"
## [54] "r_BLOCK[10,SYMBIONTboth]"
## [55] "prior_Intercept"
## [56] "prior_b"
## [57] "prior_sd_BLOCK"
## [58] "prior_cor_BLOCK"
## [59] "lp__"
## [60] "accept_stat__"
## [61] "stepsize__"
## [62] "treedepth__"
## [63] "n_leapfrog__"
## [64] "divergent__"
## [65] "energy__"
mckeon.draw <- mckeon.brm3 %>%
gather_draws(`b.Intercept.*|b_SYMBIONT.*`, regex = TRUE)
mckeon.draw
We can then summarise this
mckeon.draw %>% median_hdci()
mckeon.draw %>%
mutate(.value = exp(.value)) %>%
median_hdci()
mckeon.brm3 %>%
gather_draws(`b_Intercept.*|b_SYMBIONT.*`, regex = TRUE) %>%
ggplot() +
geom_vline(xintercept = 0, linetype = "dashed") +
stat_slab(aes(
x = .value, y = .variable,
fill = stat(ggdist::cut_cdf_qi(cdf,
.width = c(0.5, 0.8, 0.95),
labels = scales::percent_format()
))
), color = "black") +
scale_fill_brewer("Interval", direction = -1, na.translate = FALSE)
mckeon.brm3 %>%
gather_draws(`.Intercept.*|b_SYMBIONT.*`, regex = TRUE) %>%
ggplot() +
geom_vline(xintercept = 0, linetype = "dashed") +
stat_halfeye(aes(x = .value, y = .variable)) +
theme_classic()
mckeon.brm3$fit %>% plot(type = "intervals")
This is purely a graphical depiction on the posteriors.
mckeon.brm3 %>% tidy_draws()
mckeon.brm3 %>% spread_draws(`.*Intercept.*|b_SYMBIONT.*`, regex = TRUE)
mckeon.brm3 %>%
posterior_samples() %>%
as_tibble()
mckeon.brm3 %>%
bayes_R2(re.form = NA, summary = FALSE) %>%
median_hdci()
mckeon.brm3 %>%
bayes_R2(re.form = ~ (1 | BLOCK), summary = FALSE) %>%
median_hdci()
mckeon.brm3 %>%
bayes_R2(re.form = ~ (SYMBIONT | BLOCK), summary = FALSE) %>%
median_hdci()
In addition to comparing each of the symbiont types against the control group of no symbionts, it might be interesting to investigate whether there are any differences between the predation protection provided by crabs and shrimp, as well as whether having both crabs and shrimp symbionts is different to only a single symbiont type.
These contrasts can be explored via specific contrasts.
| SYMBIONT | Crab vs Shrimp | One vs Both | None vs Symbiont |
|---|---|---|---|
| none | 0 | 0 | 1 |
| crab | 1 | 1/2 | -1/3 |
| shrimp | -1 | 1/2 | -1/3 |
| both | 0 | -1 | -1/3 |
mckeon.brm3 %>%
emmeans(~SYMBIONT, type = "response") %>%
pairs()
## contrast odds.ratio lower.HPD upper.HPD
## none / crabs 11.08 4.30e-09 2288
## none / shrimp 29.20 2.09e-08 1713
## none / both 135.89 4.93e-05 27612
## crabs / shrimp 2.89 0.00e+00 10545
## crabs / both 15.58 1.20e-09 181287
## shrimp / both 4.59 9.64e-08 2654
##
## Point estimate displayed: median
## Results are back-transformed from the log odds ratio scale
## HPD interval probability: 0.95
mckeon.em <- mckeon.brm3 %>%
emmeans(~SYMBIONT, type = "link") %>%
pairs() %>%
gather_emmeans_draws() %>%
mutate(PEff = exp(.value)) # ,
# Prob = plogis(.value))
mckeon.em %>% head()
mckeon.em %>%
group_by(contrast) %>%
dplyr::select(contrast, PEff) %>%
median_hdi()
mckeon.em %>%
group_by(contrast) %>%
summarize(Prob = sum(PEff > 1) / n())
mckeon.em <- emmeans(mckeon.brm3, ~SYMBIONT, type = "link") %>%
gather_emmeans_draws()
mckeon.em %>%
mutate(P = plogis(.value)) %>%
median_hdci(P)
cmat <- cbind(
crab_vs_shrimp = c(0, 1, -1, 0),
one_vs_both = c(0, 1 / 2, 1 / 2, -1),
symbiont = c(1, -1 / 3, -1 / 3, -1 / 3)
)
mckeon.em <- mckeon.brm3 %>%
emmeans(~SYMBIONT, type = "link") %>%
contrast(method = list(cmat)) %>%
gather_emmeans_draws() %>%
mutate(Fit = exp(.value))
mckeon.em %>% median_hdci(Fit)
mckeon.em %>%
group_by(contrast) %>%
median_hdi(Fit)
mckeon.em %>%
group_by(contrast) %>%
summarize(sum(Fit > 1) / n())
newdata <- emmeans(mckeon.brm3, ~SYMBIONT, type = "response") %>% as.data.frame()
head(newdata)
ggplot(newdata, aes(y = prob, x = SYMBIONT)) +
geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD))
mckeon.brm3 %>% bayes_R2(re.form = NA)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.2615737 0.0917948 0.04009156 0.3793881
mckeon.brm3 %>%
bayes_R2(re.form = NA, summary = FALSE) %>%
median_hdci()
mckeon.brm3 %>%
bayes_R2(re.form = ~ (1 | BLOCK), summary = FALSE) %>%
median_hdci()
mckeon.brm3 %>%
bayes_R2(re.form = ~ (SYMBIONT | BLOCK), summary = FALSE) %>%
median_hdci()